Abstract

Recent research indicates that perceptual learning (PL) — experience-induced changes 
in the way perceivers extract information — plays a larger role in complex cognitive 
tasks, including abstract and symbolic domains, than has been understood in theory 
or implemented in instruction. Here, we describe the involvement of PL in complex 
cognitive tasks and why these connections, along with contemporary experimental 
and neuroscientific research in perception, challenge widely held accounts of the 
relationships among perception, cognition, and learning. We outline three revisions to 
common assumptions about these relations: 1) Perceptual mechanisms provide 
complex and abstract descriptions of reality; 2) Perceptual representations are often 
amodal, not limited to modality-specific sensory features; and 3) Perception is
selective. These three properties enable relations between perception and cognition that
are both synergistic and dynamic, and they make possible PL processes that adapt 
information extraction to optimize task performance. While PL is pervasive in natural 
learning and in expertise, it has largely been neglected in formal instruction. We 
describe an emerging PL technology that has already produced dramatic learning gains 
in a variety of academic and professional learning contexts, including mathematics, 
science, aviation, and medical learning. 


1. INTRODUCTION

On a good day, the best human chess grandmaster can defeat the
world’s best chess-playing computer. Not every time, but sometimes. The
computer program is relentless; every second, it examines upward of 200
million possible moves. Its makers incorporate sophisticated methods for
evaluating positions, and they implement strategies obtained from grandmaster
consultants. Arrayed against these formidable techniques, it is surprising that
any human can compete at all.

If, like the computer, humans played chess by searching through possible 
moves, pitting human versus computer would be pointless. Estimates of 
human search in chess suggest that even the best players examine on the 
order of four possible move sequences, each about four plies deep (where a 
ply is a pair of turns by the two sides). That estimate is per turn, not per second, 
and a single turn may take many seconds. If the computer were limited 
to 10 s of search per turn, its advantage over the human would be about 
1,999,999,984 moves searched per turn. 
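The arithmetic behind this comparison can be laid out as a short sketch. The figures are the ones given in the text; the constant and function names are hypothetical, and counting each ply of a human move sequence as one examined move is an assumption made to reproduce the text's total.

```python
# Figures taken from the text; the 10 s search window is the text's own hypothetical.
COMPUTER_MOVES_PER_SECOND = 200_000_000
SEARCH_SECONDS_PER_TURN = 10
HUMAN_SEQUENCES_PER_TURN = 4
PLIES_PER_SEQUENCE = 4


def moves_searched_per_turn():
    """Return (computer, human) counts of moves examined in one turn."""
    computer = COMPUTER_MOVES_PER_SECOND * SEARCH_SECONDS_PER_TURN
    # Assumption: each ply in a sequence is counted as one examined move.
    human = HUMAN_SEQUENCES_PER_TURN * PLIES_PER_SEQUENCE
    return computer, human


computer, human = moves_searched_per_turn()
advantage = computer - human
print(f"computer: {computer:,}  human: {human}  advantage: {advantage:,}")
```

On these assumptions the computer examines 2,000,000,000 moves per turn against 16 for the human, which gives the difference quoted above.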

Given this disparity, how can the human even compete? The accomplishment
suggests information-processing abilities of remarkable power
but mysterious nature. Whatever the human is doing, it is, at its best, roughly
equivalent to 2 billion moves of raw search per turn. It would not be
overstating to describe such abilities as “magical.”

We have not yet said what abilities make this possible, but before doing 
so, we add another observation. Biological systems often display remark- 
able structures and capacities that have emerged as evolutionary adaptations 
to serve particular functions. Compared to flying machines that humans 
have invented, the capabilities of a dragonfly, hummingbird, or mosquito are 
astonishing. Yet, unlike anatomical and physiological adaptations for move- 
ment, the information-processing capabilities we are considering are all the 
more remarkable because it is unlikely that they evolved for one particular 
task. We did not evolve to play chess. What explains human attainments 
in chess are highly general abilities that contribute to learned expertise in
many domains. Such abilities may have evolved for ecologically important 
tasks, but they have such power and generality that humans can become 
remarkably good in almost any domain involving complex structure. 

What abilities are these? They are abilities of perceptual learning (PL).
The effects we are describing arise from experience-induced changes in the 
way perceivers pick up information. With practice in any domain, humans 
become attuned to the relevant features and structural relations that define 
important classifications, and over time, we come to extract these with 
increasing selectivity and fluency. 

The existence of PL and its pervasive role in learning and expertise say 
something deeply important about the way human intelligence works. What 
it says violates common conceptions that view perception and learning as 
separate and nonoverlapping processes. It is common to think of perception 
as delivering basic information in a relatively unchanging way. According 
to this view, high-level learning happens elsewhere — in committing facts 
to memory, acquiring procedures, or generating more complex or abstract 
products from raw perceptual inputs by means of reasoning processes. Con- 
temporary experimental and neuroscientific research in perception, as well 
as new discoveries in PL, require revision of these assumptions in at least 
three ways: 1) perceptual mechanisms provide complex and abstract descrip- 
tions of reality, overlapping and interacting deeply with what have tradi- 
tionally been considered “higher” cognitive functions; 2) the representations 
generated by these perceptual mechanisms are not limited to low-level 
sensory features bound to separate sensory modalities; and 3) what percep- 
tion delivers is not fixed, but progressively changing and adaptive. 

We return to the first two ideas later on, but consider now what is 
implied by the third idea, the idea of PL. We can understand the adaptive 
nature of our perceptual abilities by way of contrast. Suppose we developed 
a set of algorithms in a computer vision system to recognize faces. The 
system would be structured to take input through a camera and perform 
certain computations on that input. If it worked properly, when we used 
the system for the thousandth time, it would carry out these computations 
in the same way as it did its first time. It is natural to think of a perceiving 
system as set up to acquire certain inputs and perform certain computations, 
ultimately delivering certain outputs. 

Our brains do not work this way. If recognizing faces is the task, the 
brain will leverage ongoing experience to discover which features and pat- 
terns make a difference for important face classifications. Over time, the 
system will become selectively attuned to extract this information and take
it in in bigger chunks. (This is true even for perceptual abilities which, like 
face perception, likely have innate foundations.) With appropriate practice, 
this information extraction will become faster and more automatic. The 
automatization of basic information pickup paves the way for the discovery 
of even more complex relations and finer detail, which in turn becomes pro- 
gressively easier to process (Bryan & Harter, 1899). This cyclic process can 
be a positive feedback loop: Improvements in information extraction lead 
to even more improvements in information extraction. The resulting abili- 
ties to see at a glance what is relevant, to discern complex patterns and finer 
details, and to do so with minimal cognitive load, are hallmarks of expertise 
in all domains where humans attain remarkable levels of performance.

It is likely that this type of learning comprises a much bigger part of 
the learning task in many domains than has been understood in theoretical 
discussions of learning or implemented in methods of instruction. What 
is being discovered about PL has implications for learning and instruction 
that parallel what researchers in artificial intelligence have discovered, “that, 
contrary to traditional assumptions, high-level reasoning requires very little 
computation, but low-level sensorimotor skills require enormous compu- 
tational resources” (http://en.wikipedia.org/wiki/Moravecs_paradox). An 
artificial intelligence researcher, Hans Moravec, elaborated this idea in what 
has come to be known as “Moravec’s Paradox” (Moravec, 1988): 

Encoded in the large, highly evolved sensory and motor portions of the human
brain is a billion years of experience about the nature of the world and how to
survive in it. The deliberate process we call reasoning is, I believe, the thinnest
veneer of human thought, effective only because it is supported by this much older
and much more powerful, though usually unconscious, sensorimotor knowledge.

We are all prodigious Olympians in perceptual and motor areas, so good that
we make the difficult look easy. Abstract thought, though, is a new trick, perhaps
less than 100 thousand years old. We have not yet mastered it. It is not all that
intrinsically difficult; it just seems so when we do it.

In what follows, we will add elaborations of two kinds to Moravec’s Paradox.
First, our Olympian perceptual abilities are astounding because they give us 
access to a great many of the abstract relations that underlie thought and 
action. “Sensorimotor knowledge” does not convey the scope and power of 
what perceptual mechanisms deliver. Not only is explicit abstract thinking 
possibly a newer evolutionary acquisition, but the work of abstraction is 
not exclusively the province of thinking processes alone. Much of thinking 
turns out to be seeing, if seeing is properly understood. 
The second elaboration is that the evolutionary heritage that makes
us perceptual Olympians involves not only fixed routines, but perceptual 
systems that change — that attune, adapt, and discover to optimize learning, 
problem solving, and complex task performance. These changes comprise 
a much larger component of learning and expertise than is usually under- 
stood in learning research. Such an understanding of PL has been even 
more conspicuously missing from the efforts to improve school learning 
and other formal instructional efforts. 

In this chapter, we describe recent work on PL, with a particular focus 
on its relation to complex cognitive tasks. One important goal is to describe 
how PL relates to perception, cognition, and learning. Some of the domains 
in which we apply PL, such as mathematics, will seem distant from per- 
ception to many readers. Thus, the theoretical underpinnings of the effort 
deserve to be spelled out, and doing so may facilitate the understanding 
of current efforts and continuing progress in these areas. Making the basic 
connections here is important because the emerging understanding of PL 
has broad implications throughout the cognitive and neural sciences. Both 
understanding PL, and using it to improve learning, depend on coherent 
accounts of the relation between perception, cognition, and learning. A 
second aim of this chapter, building on the first, is to describe an emerg- 
ing technology of PL that has many applications and offers the potential 
to address missing dimensions of learning and accelerate the growth of 
expertise in many domains. A large and growing research literature suggests 
that PL effects are pervasive in perception and learning, and that they pro- 
foundly affect tasks from the pickup of minute sensory detail to the extrac- 
tion of complex and abstract relations in complex cognitive tasks. PL thus 
furnishes a crucial basis of human expertise, from accomplishments as
commonplace as skilled reading to those as rarefied as expert medical diagnosis,
mathematical expertise, grandmaster chess, and creative scientific insight.

The article is organized as follows: In the next section, we consider the 
information-processing changes that are produced by PL. These have most 
often been examined in tasks that involve either low-level sensory discriminations
or real-world tasks that obviously depend on perceptual discrimination
(e.g. detecting pathology in radiologic images). Using the example of
PL in mathematics learning, Section 3 extends PL to higher level symbolic
cognitive tasks, in which PL has seldom been considered. Understanding 
the role of PL in such tasks requires a revised account of the relations of 
perception, cognition, and learning. In Section 4, we argue that the com- 
mon conceptions of these processes and their relations do not provide a 
satisfactory foundation for understanding high-level PL effects, primarily
because they are based on outdated ideas about perception. Drawing on
more recent views, we describe a framework for understanding PL components
of high-level cognitive tasks that is rooted in the amodal and abstract
character of perception itself. With this framework in hand, we consider
more fully in Section 5 the applications of PL to instruction.


Perceptual learning refers to experience-induced improvements in
the pickup of information (E. Gibson, 1969). The fundamental observation
is that perceptual pickup is not a static process. After an intensive period
of research in the 1960s and a somewhat dormant period for two decades
afterward, PL has become an area of concentrated focus in the cognitive
and neural sciences. The relative neglect and occasional focus on PL in the
history of learning research and its recent emergence have been described
elsewhere, as have issues of modeling PL and understanding its neural bases
(for a review, see Kellman & Garrigan, 2009). Another important question
has been the relation between simple laboratory tasks involving PL and
more complex, real-world tasks typically involving the extraction of invariance
amidst variation; recent work suggests that all of these tasks partake of
a unified learning process in which the discovery of relevant information
and its selective extraction are key notions (Ahissar, Laiwand, Kozminsky,
& Hochstein, 1998; Garrigan & Kellman, 2008; Li, Levi, & Klein, 2004;
Mollon & Danilova, 1996; Petrov, Dosher, & Lu, 2005; Zhang et al., 2010).
In the present discussion, we build on this recent work but do not revisit it.
Here, we focus on the range of effects produced by PL, before turning
to more general issues of how these relate to basic notions of perception,
cognition, and learning.

A wealth of research now supports the notion that, with appropriate
practice, the brain progressively configures information extraction in any
domain to optimize task performance. What are the changes involved?
The list involves a variety of distinguishable effects that serve to improve
performance. Kellman (2002) argued that these effects fall into two
categories: discovery and fluency effects. Discovery effects involve finding
what information is relevant to a domain or classification. Fluency
effects involve coming to extract information with greater ease, speed,
or reduced cognitive load. Table 4.1 summarizes some of the changes
between novices and experts that occur from PL.

Table 4.1 Some Characteristics of Expert and Novice Information Extraction.
Discovery effects involve learning and selectively extracting features or relations
that are relevant to a task or classification. Fluency effects involve learning to extract
relevant information faster and with lower attentional load (see text).

                      Novice                        Expert

Discovery Effects
  Selectivity         Attention to irrelevant       Selective pickup of relevant
                      and relevant information      information; filtering/inhibition
                                                    of irrelevant information
  Units               Simple features               Larger chunks; higher-order
                                                    relations

Fluency Effects
  Search type         Serial processing             Increased parallel processing
  Attentional load    High                          Low
  Speed               Slow                          Fast

Discovery effects include the fundamental idea of selection (Gibson, 1969;
Petrov et al., 2005): We discover and pick up the information relevant to a
task or classification, ignoring, or perhaps inhibiting (Kim, Imai, Sasaki, &
Watanabe, 2012; Wang, Cavanagh, & Green, 1994) available information that is
irrelevant. We come to extract information in larger chunks, forming and
processing higher-level units (Chase & Simon, 1973; Goldstone, 2000).
Most profoundly (and mysteriously), we come to discover new and often
complex relationships in the available information to which we were
initially insensitive (Chase & Simon, 1973; Kellman, 2002). These discovery
processes are pervasive in early learning. When a child learns what a
dog, toy, or truck is, this kind of learning is at work. From a number of
instances, the child extracts relevant features and relations. These allow
later recognition of previously seen instances, but more important, even a
very young child quickly becomes able to categorize new instances. Such
success implies that the learner has discovered the relevant characteristics
or relations that determine the classification. As each new instance
will differ from previous ones, learning also includes the ignoring of
irrelevant differences.

Fluency effects refer to changes in the efficiency of information extrac- 
tion. Practice in classifying leads to fluent and ultimately automatic process- 
ing (Schneider & Shiffrin, 1977), where automaticity in PL is defined as the 
ability to pick up information with little or no sensitivity to attentional load.
As a consequence, perceptual expertise may lead to more parallel processing 
and faster pickup of information. 

The distinction between discovery and fluency effects is not always per- 
fectly clear. For example, becoming selective in the use of information (a 
discovery effect) increases efficiency and improves speed (fluency effects). 
It does seem, however, that there are clear cases of each category. In one of 
the earliest relevant studies, Bryan and Harter (1899) reported that telegraph 
operators learning to receive Morse code reached plateaus in performance, 
but that continuing practice while at a plateau appeared to pave the way for 
substantial new gains in performance. Their interpretation is that the eventual
improvements in performance came from automaticity — operators coming 
to extract the same information with less cognitive load, ultimately enabling 
them to discover more complex relations in the input. This interpreta- 
tion is consistent with a relatively pure fluency improvement, that is, with 
practice at a certain point not changing the information being extracted, 
but allowing its extraction with reduced attentional load (Schneider & 
Shiffrin, 1977). The continuing cycle of discovery and fluency described by 
Bryan and Harter — discovery leading to improved performance, followed 
by improved fluency, leading in turn to higher level discovery — may be 
the driver of remarkable attainments of human expertise in many complex 
tasks. 



3. PERCEPTUAL LEARNING IN MATHEMATICS: AN EXAMPLE


There is a common view about the relation of perception and 
cognition. In a hierarchy of cognitive processes, perception is typically 
considered “low-level,” where “higher” cognitive processes encompass 
categorization, thinking, reasoning, etc. Eleanor Gibson, who pioneered 
the field of PL, thought of it as a pervasive contributor to expertise, giving
examples as varied as chick sexing, wine tasting, map reading, X-ray
interpretation, sonar interpretation, and landing an aircraft. Even these 
examples, however, are mostly confined to tasks where the major task 
component is classifying perceptual inputs based on subtle kinds of infor- 
mation. For most of these examples, one might still maintain a notion of 
perception as handing off results of basic feature detection, which then 
become the raw material for conceptual analysis, cognitive inferences, and 
high-level thinking. 
Recent work, however, indicates that PL is strongly involved even
in very high-level cognitive domains, such as the learning and understanding
of mathematics (Kellman, Massey, & Son, 2009; Landy &
Goldstone, 2007). Learning in these domains involves a variety of cognitive
processes, but attaining expertise depends substantially on pattern
recognition and fluent processing of structure, as well as mapping
across transformations (e.g. in algebra) and across multiple representations
(e.g. graphs and equations). In fact, given conventional instruction,
the PL components of expertise may be disproportionately responsible
for students’ difficulties in learning (Kellman et al., 2009). Although
this research area is relatively new, findings indicate that even short PL
interventions can accelerate the fluent use of structure, in contexts such
as the mapping between graphs and equations (Kellman et al., 2008;
Silva & Kellman, 1999), apprehending molecular structure in chemistry
(Wise, Kubose, Chang, Russell, & Kellman, 2000), processing algebraic
transformations, and understanding fractions and proportional reasoning
(Kellman et al., 2009).

The structures and relations that are relevant to PL in these domains 
are more abstract and complex than what we normally think of as being 
processed perceptually. As an example, Kellman et al. (2009) studied 
algebra learning using a perceptual learning module (PLM) designed 
to address the seeing of structure in algebra. Participants were 8th and 
9th graders at midyear in Algebra I courses. Students at this point in 
their learning show a characteristic pattern. Given simple equations to 
solve, such as x + 4 = 12, accuracy is high, with an average across participants
of around 80% correct solutions. Remarkably, however, students
at this stage take an average of about 28 s per problem! This pattern
suggests that conventional instruction does a good job of addressing
the declarative and procedural aspects of solving algebraic equations.
Students know they should “get x alone on one side,” and “do the same
operation to both sides of the equation,” and they are able to accomplish
these goals with high accuracy. Their response times, however, suggest
that we may underestimate the seeing problem in algebra learning.
Someone with much greater experience looks at x + 4 = 12 and sees the 
answer at a glance. This kind of ability can reach higher and higher levels, 
supporting greater expertise, as illustrated in this example: 
Given that this is a single equation with two unknowns, one might think
at first glance that the problem does not permit a numerical solution for
μ, but a more practiced observer may easily see that the equation permits
easy simplification, and μ = 1. In this case, even the relative unfamiliarity of
the symbols used may make the seeing problem harder. Without changing
anything mathematical, compare

$$ m = \frac{4x - 2xy}{(2 - y)(2x)}. $$

If this equation still has you reaching for pen and paper, seeing the structure
may be better illustrated in this simpler version:

$$ m = \frac{x - xy}{x(1 - y)}. $$

These examples all involve the distributive property of multiplication 
over addition. However, being able to enunciate this property would not 
produce fluent recognition of the distributive structure. Conceptually, and 
even computationally, these examples are all very similar, but you may 
have noticed the relevant structure more easily in one case than another. 
Improved encoding of relevant structure and potential transformations in 
equations is a likely result of PL, one that is difficult to address in conven- 
tional instruction. 
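The distributive structure in the two comparison equations can be made explicit in a short worked derivation. This is a sketch rather than part of the original presentation: it reads the equations as m = (4x - 2xy) / ((2 - y)(2x)) and m = (x - xy) / (x(1 - y)), and the restrictions on x and y are an added assumption, needed only so that the cancelled factors are nonzero.

```latex
\begin{align*}
m &= \frac{4x - 2xy}{(2 - y)(2x)}
   = \frac{2x(2 - y)}{(2 - y)(2x)} = 1,
   && x \neq 0,\ y \neq 2, \\
m &= \frac{x - xy}{x(1 - y)}
   = \frac{x(1 - y)}{x(1 - y)} = 1,
   && x \neq 0,\ y \neq 1.
\end{align*}
```

In both cases the only step is factoring the numerator, after which the numerator and denominator are seen to be the same product.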

Following this kind of intuition, we developed our Algebraic Transforma- 
tions PLM in order to apply PL methods to improve students’ pattern pro- 
cessing and fluency in algebra. We developed a classification task in which 
participants viewed a target equation or expression and made speeded judg- 
ments about which one of a set of possible choices represented an equivalent 
equation or expression, produced by a valid algebraic transformation. A key 
goal of this PLM was to contrast the declarative knowledge components 
(facts and concepts that can be verbalized) with the idea of “seeing” in alge- 
bra. The goal was to get students to see the structure of expressions and 
equations, and relations among them, in order to use transformations fluently. 

In the Algebraic Transformations PLM, we did not ask students to solve 
problems. Instead, we devised a classification task that exercised the extrac- 
tion of structure and the seeing of transformations. On each trial, an equa- 
tion appeared, and the student had to choose which one of several options 
below was a legal transformation. An example is shown in Figure 4.1. In 
addition to testing whether practice in the PLM improved accuracy and 
fluency in recognizing transformations, we also examined whether students
would be able to transfer learning to solving algebraic equations. 

This study was carried out with forty-two 8th and 9th grade students at 
midyear in an Algebra I course. Students participated in two 40-min learning
sessions using the Algebraic Transformations PLM. On each trial, they were
shown a target equation and were asked to select which of four choices 
could be correctly derived by performing a legal algebraic transforma- 
tion on the target. Students were given feedback after each trial indicating 
whether or not they had chosen the correct answer. Incorrect answers were 
followed by an interactive feedback screen in which students’ attention was 
focused on the relevant transformation. 
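The trial procedure described above yields two measures per session, accuracy and response time. A minimal sketch of how such trial records might be stored and summarized follows; the `Trial` class, the `summarize` function, and the sample data are hypothetical illustrations, not the software or data used in the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    chosen: str        # option the student selected, e.g. "C"
    correct: str       # option that is a legal transformation
    response_s: float  # response time in seconds

    @property
    def is_correct(self) -> bool:
        return self.chosen == self.correct


def summarize(trials):
    """Return (proportion correct, mean response time in seconds)."""
    accuracy = mean(1.0 if t.is_correct else 0.0 for t in trials)
    mean_rt = mean(t.response_s for t in trials)
    return accuracy, mean_rt


# Made-up illustrative data, not results from the study.
session = [Trial("A", "C", 13.0), Trial("C", "C", 11.0),
           Trial("C", "C", 8.0), Trial("C", "C", 6.0)]
accuracy, mean_rt = summarize(session)
print(f"accuracy={accuracy:.2f}, mean response time={mean_rt:.1f} s")
```

Keeping the chosen and correct options on each record, rather than only a right/wrong flag, would also allow the feedback screen to be driven from the same data.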

The task that formed the core of the PLM, matching an equation to a
valid transformation, is directly useful to development of pattern recognition
and skill in algebra. The PLM produced dramatic gains for virtually all students
on this task, with accuracy changing from about 57% on initial learning
trials to about 85% at the end of PLM usage, and response times per problem
reduced by about 55%, from nearly 12 s per problem to about 7 s, suggesting
the development of fluency in processing symbolic structure of equations.

Perhaps more remarkable was the transfer to actual algebra problem
solving. Although students did not receive any practice in solving equations
during the learning phase, the relatively brief intervention aimed
at seeing transformations produced a dramatic reduction in the post-test
equation solving time, from about 28 s per problem to about 12.5 s per
problem (Figure 4.2, right panel). A delayed post-test showed that these
gains were lasting: The average solving time was actually slightly faster than
in the immediate post-test when tested after a 2-week interval. There was
also some indication that accuracy in equation solving, already high at pretest,
received some benefit in the delayed post-test (Figure 4.2, left panel).

    6y + 5x - 20 = 43

    A   6y - 20 = 43 + 5x
    B   6y - 20 = 43 (-5x)
    C   6y - 20 = 43 - 5x
    D   6y - 20 = 43 - x - 5

Figure 4.1 Example of a Problem Display in the Algebraic Transformations PLM
(see text).

The idea that mathematical understanding has an important PL com- 
ponent may seem counterintuitive, for many reasons. If perception is about 
properties such as brightness, color, the orientation of edges, or even the 
locations of objects and surfaces, how is this relevant to a mathematics class? 
These perceptual contents might at best serve up the occasional concrete 
example, but they hardly encompass mathematical ideas. On traditional 
views, most of mathematical thinking, and the instructional methods used 
to teach it, involve declarative knowledge and procedures. Perception may 
serve the banal role of allowing the student to see the markings on the 
chalkboard, but the processing of mathematical ideas must surely be farther 
up in the cognitive hierarchy! There would seem to be a gap between the 
basic and concrete information furnished by the senses and the abstract 
conceptual content of mathematics. The simple difference between the 
level or types of information that perception is presumed to furnish and 
what is required for abstract thinking seems a formidable obstacle to the 
kind of connection we are making here. 

But it is not the only obstacle. Mathematics has inherently symbolic
aspects. The symbols in an equation have an arbitrary relation to the ideas
they represent. Unlike the functional properties of objects and events in
the world, the meanings of mathematical ideas would seem remote from
stimulus information reaching perceptual systems. Moreover, much of the
expertise conferred by PL may be implicit (e.g. try describing to a stranger
how you recognize your sister’s voice on the telephone), whereas mathematics
is in many respects an extremely explicit discipline. Steps must be
justified and proofs must be offered. Even assuming the relevance of PL to
complex tasks, one might still wonder about the application to symbolic,
explicit domains such as mathematics.

Figure 4.2 Results of Algebraic Transformations PLM Study for the Transfer Task
of Solving Algebraic Equations. Data for pretest, post-test, and delayed post-test are
shown for accuracy (left panel) and response time (right panel). Error bars indicate ±1
standard error of the mean (Adapted from Kellman, Massey & Son, Topics in Cognitive
Science, 2009; Cognitive Science Society, Inc., p. 14). For a color version of this figure, the
reader is referred to the online version of this book.

Many of these objections have straightforward answers. Even if they
involve symbolic content, mathematical representations pose important
information extraction requirements and challenges. Characteristic difficulties
in mathematics learning may directly involve issues of discovery
and fluency aspects of PL. A number of studies indicate the role of PL
in complex cognitive domains, such as mathematics (Kellman et al., 2009;
Landy & Goldstone, 2007; Silva & Kellman, 1999), language or language-like
domains (Gomez & Gerken, 1999; Reber, 1993; Reber & Allen, 1978;
Saffran, Aslin, & Newport, 1996), chess (Chase & Simon, 1973), and reading
(Baron, 1978; Reicher, 1969; Wheeler, 1970). Some have asserted that
in general, abstract concepts have crucial perceptual foundations (Barsalou,
1999; Goldstone, Landy, & Son, 2008; Prinz, 2004).

The extensive use of tangible representations in mathematics, science, and 
other abstract conceptual domains is also a bit of a giveaway. Hardly two steps 
into considering a complicated problem in mathematics, science, economics,
or other quantitative disciplines, we construct a graph or a diagram, if not
several. One’s facility in dealing with these representations obviously changes 
with experience, in obscure ways that go beyond being able to explain the 
basics of how the diagram represents information. We seem to grapple with 
complex ideas in mathematics and science by using spatial, configural, and 
sometimes temporal structures (i.e. simulations) that draw on representa- 
tional capacities rooted in our perceptions of spatial and temporal structure 
in the world. A graph of the change of world temperature over time is a 
spatial object, and the patterns therein are comprehended by grasping spatial 
relations, although neither temperature nor time is a spatial notion. Reliably 
accompanying the use of these structures and representations are powerful, 
general capacities to learn to detect relations and become able to fluently 
select information that is important within a domain: Perceptual learning. 

Still, we are stuck with the first objection. Perception, as commonly 
understood, just seems to be at the wrong level for explaining comprehension 
in mathematics. Maybe the connection is intended as some kind 
of metaphor. If one conceives of perception as consisting of separate sense 
modalities, then what we obtain through vision must somehow be built from 
sensory experiences of brightness and color. In audition, we are presumably 
extracting sequences and combinations of loudness and pitch. In algebra 
class, one should listen to the teacher’s voice and look at the blackboard, but 
surely algebra is not about arrays of color, brightness, loudness, or pitch. 

Later in this chapter, we will have more to say about PL technology and 
the potential for radically improving learning by integrating methods that 
accelerate PL with conventional instruction. For now, however, we focus on 
what appears to be most perplexing in our example of PL in complex cog- 
nitive domains. If it is surprising that changing the perceiver can be the key 
to advancement in domains such as mathematics, it is because there is work 
to do in clarifying the relation of perception to learning and cognition. This 
is the focus of the next section. 


Continuing scientific progress and practical applications of PL will be 
facilitated by a better understanding of the relations between perception, 
cognition, and learning. One might assume that these relations are well 
understood, but in fact they are not. A primary reason is that progress in 
understanding perception in the past several decades necessitates a rethink- 
ing of some of these relations, invalidating some ways of thinking and pav- 
ing the way for new insights. 

As we mentioned above, commonly held views of perception would sug- 
gest that the products of perceiving are too low level to have consequences for 
abstract thinking and learning. Thus, before the last few years, if someone sug- 
gested a role for perception in learning mathematics, it would involve using 
shaded diagrams to illustrate fraction concepts or manipulatives that might 
allow learners to have some concrete realization of adding numbers. These 
applications are quite different from the idea of a general learning mechanism 
by which learners progressively change the way they extract structure and 
relations from symbolic equations, or gain competency in mapping structure 
across differing mathematical representations, or come to selectively attend 
to important relations, rather than irrelevancies, in a measurement problem. 

In recent years, there have been trends in cognitive science arguing 
for a close relation between perception and cognition. This work includes 
empirical findings that implicate perceptual structure as being involved in 
processing abstract ideas (Landy & Goldstone, 2007) and other research 
indicating modal sensory activations accompanying cognitive tasks such as 
sentence verification (van Dantzig, Pecher, Zeelenberg, & Barsalou, 2008). 
There have been accompanying theoretical proposals that suggest that 
high-level cognition depends fundamentally on perception, including the 
ideas of perceptual symbol systems (PSS; Barsalou, 1999) and the notion 
of embodied cognition. We believe that these accounts share important 
elements, and all are an improvement on an earlier, implicit general view of 
perception being detached from thinking. 

In our view, however, none of these efforts provides a suitable basis for 
understanding the relation of perception and PL to the rest of cognition 
and to complex learning domains. As a result, the situation is confusing. We 
have found this to be especially troubling in terms of connecting emerging 
findings in PL and PL technology in instruction with conventional ideas 
of cognition, teaching, and learning. The reason is that neither the older 
assumptions about how these relate nor most recent proposals in cognitive 
psychology provide a coherent basis for understanding the relation of PL 
to cognition in general. We briefly discuss some of these views and their 
problems before describing a more coherent, as well as simpler, account, one 
grounded in a contemporary understanding of perception. 

4.1. The Classical View of Perception 

In classical empiricist theories of perception and perceptual development, 
widely shared for several centuries by many philosophers and psychologists, 
all meaningful perception (e.g. perception of objects, motion, and spatial 
arrangement) was held to arise from initially meaningless sensations. Meaning- 
ful perception was thought to derive from associations among sensations (e.g. 
Berkeley, 1709/1910; Locke, 1690/1971; Titchener, 1902) and with action 
(Piaget, 1952). In this view, all of perception is essentially a cognitive act, con- 
structing meaning by associating sensations and connecting them with previ- 
ously remembered sensations. A modern version of this view, widely shared 
in cognitive psychology, is satirized in a famous information-processing dia- 
gram in Ulric Neisser’s book Cognition and Reality (Neisser, 1976), in which 
an input labeled “retinal image” is connected by arrows to successive boxes 
labeled “processing,” “more processing,” and “still more processing.” 

This view of perception came with its own view of PL. Essentially, on 
this view, all meaningful perception is a product of learning. Inferring the 
motion of an object from sensations encoded at different positions and 
times, or understanding the three-dimensional shape of an object by retriev- 
ing previously stored images gotten from different vantage points involve 
meaningless sensations combined with associative learning processes (e.g. 
Locke, 1690/1971), or unconscious inference processes working on current 
and previously stored sensations (Helmholtz, 1864/1972). 

4.2. Perceptual Symbol Systems 

There have been clear trends among cognitive researchers to connect per- 
ception more closely to other cognitive processes or to uncover perceptual 
influences in cognitive tasks. Particularly influential has been the work of 
Barsalou on “perceptual symbol systems” (PSS). PSS comprise proposals to 
account for a number of important phenomena, including well-known dif- 
ficulties of specifying formal, context-free criteria of inclusion in concep- 
tual categories (e.g. what makes something a cat); the apparently dynamic, 
variable aspects of representations; and the engagement of cortical areas 
involved with perception during cognitive tasks. 

According to PSS, the idea of nonperceptual, abstract thought does not 
really exist. Even our most abstract ideas are attained by reference to stored 
perceptual encodings. As Barsalou (1999) explains, 

. . .abstract concepts are perceptual, being grounded in temporally extended 
simulations of external and internal events. (Barsalou, 1999, p. 603) 

Specifically, when we think of an apparently abstract idea, that process- 
ing consists of running a “simulation,” which consists of “re-enactment in 
modality-specific systems”: 

The basic idea behind this mechanism is that association areas in the brain 
capture modality-specific states during perception and action, and then reinstate 
them later to represent knowledge. When a physical entity or event is perceived, it 
activates feature detectors in the relevant modality-specific areas. During visual 
processing of a car, for example, populations of neurons fire for edges, vertices and 
planar surfaces, whereas others fire for orientation, colour and movement. The 
total pattern of activation over this hierarchically organized distributed system 
represents the entity in vision (e.g. Zeki, 1993; Palmer, 1999). Similar distributions of 
activation on other modalities represent how the entity feels and sounds, and the 
actions performed on it. (Barsalou, 2003, p. 1179) 

Barsalou contrasts this view with what he sees as the more typical view 
in cognitive science that information gotten through perception is “trans- 
duced” into amodal representations, where 

...an amodal symbol system transduces a subset of a perceptual state into 
a completely new representation language that is inherently nonperceptual. 
(Barsalou, 1999, p. 577) 

We believe that Barsalou and others have identified a key problem — 
a perceived disconnect between information processing involving most 
cognitive processes and perception. The general idea that these are more 
closely coupled than often believed is consistent with considerable evidence 
and has opened up some important issues in these fields. 

4.3. Problems for Understanding Perceptual Learning 

Regrettably, both the classical view and more recent proposals about the 
relation between perception and cognition make poor foundations for 
understanding current approaches to and results of PL. Mathematics seems 
much more abstract than perception. Consider the applications of PL to 
mathematics that we described above. On the classical view, it is hard to 
relate the abstract structures in mathematics to the aggregates of sensations 
that are the harvest from perceiving. Mathematics seems to be the province 
of higher-level reasoning, not perception. 

The situation is somewhat reversed from Barsalou’s PSS view. Here, it 
is claimed that abstract ideas do not really exist off by themselves; what we 
think of as abstract thought really consists of activations of modality-specific 
features. On this account, all mathematics would be inherently perceptual. It 
is hard to see how it would be abstract, however. If the input contents are all 
modality specific, what is mathematics? Is mathematics visual? Is it auditory? 
Tactile? Mathematics does not really seem to be any of those things. From 
the PSS account, it could be argued that thinking about a mathematical 
idea involves running certain re-enactments of particular perceptual expe- 
riences. These are likely multimodal; they could have inputs from different 
modalities such as the sound of your teacher’s voice in algebra class or the 
chalkmarks on the blackboard, or the feel of your pencil in your hand. 
Thinking about particular ideas in particular contexts would involve re- 
enacting (simulating) different subsets of stored perceptual records. 

Both this approach and the classical approach have massive problems 
with abstraction and selection. Consider a student who is mastering the con- 
cept of slope in a PLM involving graphs, equations, and word problems. 
The student’s task is to map a problem represented in one format, such as 
a graph, to the same mathematical structure as it is expressed in either an 
equation or a word problem. The student learns to extract a general idea 
that applies to new contexts, as well as structural invariants specific to 
representational types (Kellman, Massey, & Son, 2009). In a graphic 
representation, the understanding of slope emerges as involving spatial directions: 
A positively sloped function increases in height from the left to the right; 
steeper increases show larger slopes, and so on. From mapping word prob- 
lems onto graphs, the deeper understanding emerges that a positive slope 
involves increases in one quantity as another quantity increases. As water is 
heated, a rising temperature over time implies a positive slope. One could 
well recall an experience of boiling water when one thinks about slope, 
but that would not help without some mechanism of specifying which 
parts of that experience constitute slope. The slope concept can apply to 
boiling water but is not about boiling water. It has been argued that the 
PSS framework involves insurmountable problems in that rerunning vari- 
ous perceptual records provides no mechanism for selecting a particular 
idea (Landau, 1999; Ohlsson, 1999). The problem is especially severe when 
the idea is an abstract one, such as slope. Meaningful learning here would 
involve a student being able to apply slope to novel situations (e.g. knowing 
what it would mean if there were a negative slope relating number of business 
startups to interest rates). It is hard to fathom how this understanding of 
a novel case could come from rerunning subsets of prior modality-specific 
activations. As Ohlsson (1999) puts it, 

A closely related difficulty for Barsalou's theory is that the instances of some 
concepts do not share any perceptible features. Consider furniture, tools, and 
energy sources. No perceptible feature recurs across all instances of either of these 
categories. Hence, those concepts cannot be represented by combining parts of 
past percepts. (Ohlsson, 1999, p. 630) 

PL in complex cognitive domains leads to selective extraction and fluent pro- 
cessing of abstract relations, such as slope. From transactions with individual 
cases, learners come to zero in on the properties, including abstract relations, 
that underlie important classifications. The process is PL, as it changes the way 
information is extracted. The learning is highly selective; selection is so funda- 
mental to PL that Gibson (1969) used “differentiation learning” as a synonym 
for PL. Finally, the properties learned are abstract. Whether in chess, speech 
recognition, chemistry, or mathematics, PL often leads to selective, fluent 
extraction of relational and abstract information (Kellman & Garrigan, 2009). 

Traditional views of perception and recent proposals regarding percep- 
tion and cognition, such as PSS, do not appear to offer reasonable ways 
of understanding these aspects of PL. How can we understand them? To 
begin with, the answer can be found in a better understanding of percep- 
tion itself. 

4.4. The Amodal, Abstract Character of Perception 

Both the classical view of perception and recent attempts to connect per- 
ception and cognition are hampered by a failure to understand the amodal, 
abstract character of perception. 
Research and theory in perception over the past several decades have 
made it clear that perceptual systems are sensitive to complex relations 
in stimulation as their inputs, and they produce meaningful descriptions 
of objects, spatial layouts, and events occurring in the world (J. Gibson, 
1966, 1979; Johansson, 1970; Marr, 1982; Scholl & Tremoulet, 2000). Most 
perceptual mechanisms develop from innate foundations or maturational 
programs and do not rely on associative learning to provide meaningful 
perception of structure and events (for a review, see Kellman & Arterberry, 
1998). Many structural concepts that might earlier have been considered 
exclusively cognitive constructs have been shown to be rooted in perceptual 
mechanisms. Some of these include causality (Michotte, 1963), animacy 
(Johansson, 1973), and social intention (Heider & Simmel, 1944; Runeson 
& Frykholm, 1981; Scholl & Tremoulet, 2000). 

These features of perception are difficult to reconcile with a shared 
assumption of classical views, PSS, and some other approaches that the 
products of perceiving are sets of sensory activations that are modality spe- 
cific — that is, unique to particular senses. In PSS, for example, the definition 
of perceptual symbols requires that they be modality specific, consisting of 
records of “feature activations” (Barsalou, 1999, 2003). This approach to 
representation, according to Barsalou, replaces the amodal symbols common in 
other cognitive modeling, resulting in the view that there may be no truly 
abstract, amodal symbols at all. 

Any approach of this sort is difficult to reconcile with the fact that most 
of the perceptual representations that are central to our thought and action 
have a distinctly nonsensory character. For example, as the Gestalt psycholo- 
gists pointed out almost 100 years ago, the perceived shape of an object is 
something quite different from the collection of sensory elements it acti- 
vates (Koffka, 1935). The problems with obtaining the products of per- 
ception from aggregates of sensory activations are well known (J. Gibson, 
1979; Koffka, 1935; Kellman & Arterberry, 1998; Landau, 1999; Marr, 1982; 
Nanay, 2010; Ohlsson, 1999). 

The solution of how to connect perception with abstract thought is 
not that abstract thought consists of simulations of sensory feature activa- 
tions but that perception itself is amodal and abstract. The terms “modal” 
and “amodal” were in fact introduced in perception research by Michotte, 
Thinès, and Crabbé (1964) with regard to these issues. Michotte et al. used 
both modal and amodal to refer to perceptual phenomena. In Michotte’s classic 
work on visual completion, modal completion refers to cases in which 
the visual system fills in information that includes sensory properties, such 
as brightness and color, whereas amodal completion refers to filling-in 
in which the object structure is represented perceptually but there is an 
absence of sensory properties. (The latter happens, for example, when an 
object is seen as continuing behind a nearer occluding object.) Michotte’s 
view, supported by extensive research, is that both kinds of filling-in are 
accomplished by perceptual mechanisms, not processes of reasoning or 
cognition (Kanizsa, 1979; Keane, Lu, Papathomas, Silverstein, & Kellman, 2012; 
Michotte et al., 1964). In fact, both kinds of filling-in appear to depend 
on the same perceptual mechanisms (Kellman & Shipley, 1991; Kellman, 
Garrigan, & Shipley, 2005; Murray, Foxe, Javitt, & Foxe, 2004). In general, 
visual perception of ordinary surfaces and objects results in representa- 
tions of complete objects and continuous surfaces, even when many parts 
of these are not represented in local sensory input due to occlusion or 
camouflage (Kanizsa, 1979; Kellman & Shipley, 1991; Michotte et al., 1964; 
Palmer, Kellman, & Shipley, 2006). 

The issue here may be in part terminological. Barsalou (1999, 2003) 
defines perceptual symbols in general as necessarily “modal,” and contrasts 
these with the nonperceptual or “amodal” symbols. His explication of modal 
perceptual symbols includes being the property of a single sense and being 
“analogical,” in that such symbols are “represented in the same systems as 
the perceptual states that produced them. The structure of a perceptual sym- 
bol corresponds, at least somewhat, to the perceptual state that produced it” 
(Barsalou, 1999). One could explore the idea that Barsalou may be giving 
the terms “modal” and “amodal” new meanings and therefore there is no 
conflict with Michotte’s ideas. On this view, anything vision does is “modal” 
because vision is one sense, as distinguished, for example, from audition. The 
nonsensory phenomena of visual object and surface perception, and so on, 
would simply be modal under these new definitions. 

The different use of terms is accompanied by a difference in concept, 
however. The problem is clear in the proposals that perceiving an object 
consists of feature activations, such as neurons for edges, vertices, orienta- 
tion, color, etc., and that “The total pattern of activation over this hierarchi- 
cally organized distributed system represents the entity in vision.” Barsalou’s 
view is in many ways remarkably close to classical views of sensation and 
perception, as he notes (Barsalou, 1999, p. 578). 

In the field of perception, Michotte’s ideas were incorporated into the 
more comprehensive ecological, information-based theories of J.J. Gibson 
(1966, 1979). Gibson made the case that perceptual mechanisms have evolved 
to be sensitive, not to simple, local stimuli, but to higher order relations 
(invariants) in stimulation that correspond to important environmental 
properties or important events involving the perceiver and the environment. Much 
of the important information is not even present in a particular, momentary 
sensory array (image). For example, variables in optic flow — the continu- 
ously transforming projection of the environment onto the eyes — specify the 
direction of travel of a moving observer, as well as the layout of surfaces ahead 
(Gibson, 1979; Warren & Hannon, 1988). In general, Gibson embraced the 
idea of perception, at least its most functionally important aspects, as “sensa- 
tionless.” 

An example of the extraction of complex relations by perceptual mech- 
anisms to produce descriptions of high-level, abstract properties may help to 
make this idea intuitive. Johansson (1973) placed small lights (“point lights”) 
on the joints of a person, and filmed the person walking in the dark. When 
viewed by a human observer, there is a compelling and automatic percept 
of a person walking. Such displays may also convey information about gen- 
der or specific individuals. Many more complex events involving so-called 
biological motion have been shown to be quickly and effortlessly perceived, 
including dancing and jumping. 

Any static view of the dots used in these displays conveys only a mean- 
ingless jumble. Moreover, dot displays, in momentary images or in motion, 
do not at all resemble any stored images (or sets of feature activations) we 
may have of actual walking (or dancing) persons. All the basic sensory fea- 
tures in these displays are, upon first presentation, brand new. Moreover, the 
observer represents perceptually a walking person and encodes in a durable 
fashion almost nothing about positions of particular dots in momentary 
images, or dot trajectories, that comprised the animation sequence. The fact 
that observers uniformly and automatically perceive meaningful persons and 
events in these displays indicates that our normal encoding of persons and 
events in the environment includes abstract relations of high complexity.^ 
All these observations illustrate crucial and general aspects of perception. We 
do register sensory elements (and feature activations), but we do so as part 
of processes that extract complex and abstract relations relevant to detecting 
ecologically important properties of objects and events. It is these properties 
that are encoded; the basic sensory features are transient, quickly discarded, 
and, apart from the relations in which they participate, quite irrelevant to 
perception. These ideas that perceptual systems utilize complex relational 
information as inputs and produce abstract, amodal representations as 
outputs are shared by virtually all contemporary ecological and computational 
work in perception (Hochberg, 1968; Kellman & Arterberry, 1998; Marr, 
1982; Shepard, 1984; Pizlo, 2010) and are not subjects of serious dispute. 

^ They are complex enough that scientists who study computational vision have not yet been able 
to produce algorithms to approximate human performance in perceiving structure from point-light 
displays. 

We should note specifically that this view of the outputs of perception 
as amodal, meaningful abstractions applies even to seemingly simple cases 
of perception. The idea that we could represent some object in the world, 
say, a car, in terms of sets of feature detectors activated in sensory areas, 
constitutes a vast and misleading simplification. It is true that early cortical 
areas in the visual system contain orientation-sensitive units that respond to 
retinally local areas of oriented contrast. So it may seem straightforward to 
assume that activations of such cells could represent the oriented edges of a 
car that we see. But this is a misunderstanding. The perceived orientation of 
an edge of a car in the world is actually the result of complex computations 
accomplished by perceptual mechanisms; it is not a readout of the outputs 
of early orientation-sensitive units. One reason is that capturing informa- 
tion about an edge in the world requires utilizing relations among many 
different orientation-sensitive units of different local orientations and scales 
(e.g. Lamme & Roelfsema, 2000; Sanada & Ohzawa, 2006; Wurtz & Lourens, 
2000). Another problem is that the early neural units in vision encode 
two-dimensional orientations on the retina, not the three-dimensional ori- 
entations in space needed in our perceptual representations (for discussion, 
see Kellman et al., 2005). The most general version of the problem here, 
however, is that the word “orientation” means different things for the “fea- 
ture detectors” of the basic vision scientist and the object “features” needed 
in cognitive models. The former are invariably retinal, meaning that the 
orientation-sensitive units in V1 that get activated depend on the orientation 
and position of contrast on the retina of the eye. This position and 
orientation information typically changes several times a second,^ as it depends 
crucially on the position of the eyes in the head, the head on the body, and 
the body in the world. Thus, the correspondence between the orientation 
of an edge in the world and which orientation-sensitive units are firing in 
the brain is haphazard. Complex relations in the activities of 
orientation-sensitive units allow us to encode properties of objects in a world-centered 
coordinate system, but there is no reason to believe that we encode into any 
lasting form of storage any sensory records of the momentary activations of 
neurons in early cortical areas. In fact, there are reasons to believe that early 
activations of “feature detectors” in early visual processing are not accessible 
to even momentary conscious awareness (e.g. Crick & Koch, 1995; He & 
MacLeod, 2001) and cannot be accessed by learning mechanisms (Garrigan 
& Kellman, 2008). In short, the perception of a simple environmental 
property, such as the edge of a car, is a complex abstraction, based on relational 
information; the relations of this abstraction to the outputs of populations 
of detectors, such as the orientations signaled by neurons in early visual areas, 
are highly variable; and the latter are not preserved in any accessible outputs 
of the process. Elementary activations in sensory areas are not the elements 
of perceptual representations, not even the seemingly simple ones, such as 
orientation or color (see Zeki, Aglioti, McKeefry, and Berlucchi (1999) for 
a parallel argument regarding color). We would go so far as to say that the 
term “feature detector” has proven to be an unfortunate choice in sensory 
neuroscience. When construed to mean that early neural units signal the 
features of objects, surfaces or events in the world, it is a misunderstanding. 

^ Even identifying an early cortical unit with a single retinal orientation is an oversimplification. In 
fact, early cortical units in vision have complex response profiles that include changes in their 
orientation sensitivity over periods < 100 ms, and they are sensitive to many other influences of context 
(Lamme & Roelfsema, 2000; Ringach, Hawken, & Shapley, 2003). 
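
The dependence of retinal orientation on the posture of the eyes, head, and body can be made concrete with a toy calculation. The Python sketch below is an illustration under strong simplifying assumptions, not a model taken from the vision literature: orientations are treated as angles in the frontal plane, the only transformation considered is roll about the line of sight, and the rolls of eye, head, and body are assumed simply to add. The function names and the roll parameters are hypothetical.

```python
def retinal_orientation(world_deg, eye_roll_deg=0.0, head_roll_deg=0.0, body_roll_deg=0.0):
    """Orientation (0-180 degrees) at which a world edge falls on the retina.

    Toy assumption: rolls of eye, head, and body simply add, and rolling
    the observer by r degrees rotates the retinal image by -r degrees.
    """
    total_roll = eye_roll_deg + head_roll_deg + body_roll_deg
    return (world_deg - total_roll) % 180.0


def world_orientation(retinal_deg, eye_roll_deg=0.0, head_roll_deg=0.0, body_roll_deg=0.0):
    """Recover the world-centered orientation from the retinal one plus posture."""
    total_roll = eye_roll_deg + head_roll_deg + body_roll_deg
    return (retinal_deg + total_roll) % 180.0


# One vertical edge in the world (90 degrees), viewed with three head tilts.
for tilt in (0.0, 30.0, 60.0):
    on_retina = retinal_orientation(90.0, head_roll_deg=tilt)
    recovered = world_orientation(on_retina, head_roll_deg=tilt)
    print(f"head tilt {tilt:5.1f} -> retinal {on_retina:5.1f}, world {recovered:5.1f}")
```

In this sketch the same world edge falls at 90, 60, or 30 degrees on the retina depending on head tilt, so no single retinal orientation specifies the orientation of the edge in the world; only the relation between retinal orientation and posture does.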

The transition from a view of perceptual representations as some kind 
of energy imprint on the sensory surfaces to a view of these representations 
as amodal and abstract parallels the developments in other sciences. Com- 
menting on the ways in which quantum mechanics had changed concep- 
tions of matter from continuous and concrete to something much more 
abstract, the philosopher Bertrand Russell put it: “It has begun to seem that 
matter, like the Cheshire Cat, is becoming gradually diaphanous and noth- 
ing is left but the grin, caused, presumably, by amusement at those who still 
think it is there.” The lingering view of perception, and the representations 
gotten from perception, as being fundamentally about local sensory activa- 
tions is just like this. In contemporary views, sensory activations provide a 
medium from which perceptual mechanisms extract informative relations 
in order to represent in abstract fashion ecologically important structures of 
objects and events. The sensory Cheshire cat has proved similarly diaphanous, 
leaving nothing but the grin. (Even the grin, if we recall it from having 
seen a picture earlier, is an abstract structure, not an array of sensations.) 

4.5. The Selective Character of Perception 

Earlier, we noted that selection is a key principle in PL. It is a crucial char- 
acteristic, because organisms are surrounded at any moment by a wealth of 
stimulation. The tasks they need to perform require highly selected subsets 
of this information, and sometimes require discovery of complex, subtle, 
and abstract properties and relations. Moreover, we have limited immediate 
processing capacities, such that cognitive load is a major constraint on 
performance in most tasks, and conspicuously so in learning (Paas & van 
Merriënboer, 1994; Schneider & Shiffrin, 1977; Sweller & Chandler, 1991). 
Selective apprehension of information and improvements in fluency (speed, 
chunking, and automaticity or reduced load) with practice are both primary 
mechanisms by which humans cope with these limitations and major deter- 
minants of expertise in most domains. 

As we noted above, information selection in classical views would be 
hard to accomplish because pickup was based on sensations, not informa- 
tion. Both fashioning abstract ideas out of associations of sensations and 
altering the information extraction process with experience are hard to 
fathom from this starting point. The situation is different but equally prob- 
lematic from the PSS perspective. Again, if records of feature activations 
for whole episodes are what is picked up and what is stored, selection and 
isolation of invariants or distinguishing features pose an unsolved problem. 

Fortunately, the problem is much more easily handled in contemporary 
views of perception, in which selection plays an important and intrinsic role. 
As we have seen, selective computation of abstract properties, from simple 
ones such as edge orientation in space, to more complex ones such as shape 
or sets of motion relationships that specify objects, surfaces or events, is a 
fundamental characteristic of perception (J. Gibson, 1979) and appears to 
be presumed by learning mechanisms as well (Garrigan & Kellman, 2008). 

4.6. Common Amodal Perceptual Representations for 
Thought, Action, and Learning 

What is the format of perceptual representations? A holdover from tradi- 
tional theories is that information that comes in through sight is encoded in 
a visual representation, information gotten through hearing is encoded in 
an auditory representation, and so on. Products of different senses, if stored 
in separate encodings, would have to be subject to endless associations and 
calibrations to achieve even the simplest results in representing the world. 
When you perceive a bird that both squawks and flaps its wings, your brain 
would require complicated transactions to relate the squawking in the audi- 
tory world to the flapping in the visual world. 

The idea that perceptual information must be saddled with the frag- 
mentation of a separate visual world, an auditory world, a tactile world, etc. 
came originally from the obvious fact that we use different sense organs 
to pick up information, the unique sensations that characterize each sense, 
and from the assumption that the contents of perception were aggregates of 
these sensations. Such an account leads irrevocably to the idea of separate 
representations in the separate senses, along with the need for associative or 
inference processes that have to be used to connect them. 

If, as is now recognized, separate sensory input channels furnish more 
abstract information about structure in the world, it would make sense that, 
at least to some degree, these outputs converge into a common represen- 
tation. Unlike the ideas that perception is amodal and information based, 
this idea is not yet a consensus view; it is still common for researchers 
to discuss multisensory integration or amodal representations in distinct 
senses (Nanay, 2010; Pouget, Deneve, & Duhamel, 2002). Yet, along with the
obvious functional utility of having perceptual descriptions in a common 
amodal representation, there is now considerable evidence for early and 
intrinsic connections across the senses (Falchier, Clavagnier, Barone, & Ken- 
nedy, 2002; Knudsen & Knudsen, 1985; Meltzoff & Moore, 1989; Spelke, 
1987; Stein & Meredith, 1993; Wertheimer, 1961).

Some neurophysiological evidence directly implicates encoding of 
amodal properties, such as location, apart from particular sensory chan- 
nels. Knudsen (1982) discovered cells in the optic tectum of barn owls 
that respond to locations in space, whether specified auditorily or visually. 
This is direct evidence for a system encoding information about space and 
time, into which sensory channels feed, rather than a set of separate sensory 
representations. Much recent work in a variety of mammalian species also 
suggests that the brain is wired to connect the sensory input channels much 
earlier than was previously understood. Even early cortical areas, such as 
V1 and A1, that have been considered exclusively involved with one sense,
have been shown to have multisensory influences (Falchier et al., 2002; 
Ghazanfar & Schroeder, 2006; Stein & Meredith, 1993). Stein and Stanford 
(2008, p. 263), in reviewing an extensive neurophysiological literature, conclude
that “...evidence of early multisensory convergence raises fundamental
questions about the sensory-specific organization of the cortex” and
“These observations question whether there are any exclusive, modality- 
specific cortical regions and, thus, whether it is worth retaining designations 
that imply such exclusivity.” 

A wide variety of evidence and argument supports the idea that to sup- 
port learning, thought and action, perceptual descriptions must involve 
a common, amodal representation, rather than merely modality-specific 
records (Ernst & Banks, 2002; Klatzky, Wu, & Stetten, 2008; Lehar, 1999; 
Stoffregen & Bardy, 2001).

4.6.1. Embodied Cognition

Another important question is whether these representations must 
always be tied to actions. The idea of embodied cognition is a relatively 
recent set of ideas that suggests a close relationship between perception, 
cognition, and action. Like Barsalou’s PSS approach, embodied cogni- 
tion views tend to deny the idea of abstract cognitive representations 
separate from episodes of perceiving and acting. Thelen (2000) expresses 
the idea this way: 

To say that cognition is embodied means that it arises from bodily interactions
with the world and is continually meshed with them. From this point of view,
therefore, cognition depends on the kinds of experiences that come from having a 
body with particular perceptual and motor capabilities ...(p. 5) 

One issue in evaluating embodied cognition views is that there are a variety 
of them. Wilson (2002) has identified at least six possible basic claims of 
embodied cognition. They are: 

1. Cognition is situated. Cognitive activity takes place in the context of
a real-world environment, and it inherently involves perception and 
action. 

2. Cognition is time pressured. We are “mind on the hoof” (Clark, 1997),
and cognition must be understood in terms of how it functions under 
the pressures of real-time interaction with the environment. 

3. We off-load cognitive work onto the environment. Because of limits
on our information-processing abilities (e.g. limits on attention and 
working memory), we exploit the environment to reduce the cog- 
nitive workload. We make the environment hold or even manipu- 
late information for us, and we harvest that information only on a 
need-to-know basis. 

4. The environment is part of the cognitive system. The information
flow between mind and world is so dense and continuous that, for 
scientists studying the nature of cognitive activity, the mind alone is 
not a meaningful unit of analysis. 

5. Cognition is for action. The function of the mind is to guide action,
and cognitive mechanisms such as perception and memory must be 
understood in terms of their ultimate contribution to situation-appro- 
priate behavior. 

6. Off-line cognition is body based. Even when decoupled from the 
environment, the activity of the mind is grounded in mechanisms that 
evolved for interaction with the environment — that is, mechanisms of 
sensory processing and motor control.

Claims 1 and 6 are perhaps the most significant for understanding representations,
as well as the relation of PL to high-level cognitive tasks. We do not
attempt any comprehensive analysis here, but limit ourselves to extending 
the important points derived earlier for understanding PL and cognition. 

If the idea of embodied cognition is taken to mean that we do not have 
any abstract representations, able to be processed separately from the execu- 
tion of actions, it is probably incorrect, and it would fail to allow a reason- 
able account of PL effects in high-level domains, for much the same reasons 
that plague the PSS and classical accounts. Specifically, the selective, abstract, 
and amodal properties of perceptual representations — the same ones that 
make the products of perception and PL most useful for complex cognitive 
tasks — preclude too close a coupling of PL with specific actions. As we will 
see below, evidence from PL interventions in high-level cognitive domains 
suggests that when learners come to apprehend important structures, this 
learning may improve their performance on a variety of tasks, including 
remote transfer tasks. Structure may be learned and used apart from specific 
actions. PL phenomena of this sort remind us of the classic work in animal 
learning indicating that stored representations obtained from perception 
can be used flexibly for different actions (Tolman, 1948), and, if we can add 
an update, for thinking. Binding perceptual representations too closely to 
specific actions would be problematic for reasons analogous to PSS ideas, 
where rerunning segments of prior perceiving episodes, complete with sen- 
sory activations, would seem to impede the extraction of abstract invariance 
detectable in new contexts. Just as Tolman argued for representations that 
could not be explained as stimulus-response pairings, embodiment consisting
of a necessary connection between perceptual representations and
specific actions would fail to provide a reasonable account of perception 
or PL. That said, many versions of embodied cognition, including most of 
the claims above, do not mandate such an extreme connection. Indeed, the 
general idea that advances in understanding may emerge from considering 
connections among perception, action, and thought is an idea with which 
we sympathize. For example, our argument regarding the use of spatial rep- 
resentations in symbolic domains such as mathematics might be considered 
to be related to several of the six claims considered by Wilson (2002).

4.7. Implications for Perceptual Learning 

The classical view came with its own view of PL, because all of perception, 
as opposed to registration of raw sensations, was, in fact, associative learning. 
This view has been superseded by a generation of direct evidence about
perceptual development, indicating that perceptual systems deliver ecologically
meaningful descriptions, even from birth (Bushnell, Sai, & Mullin,
1989; Held, 1985; Kellman & Spelke, 1983; Meltzoff & Moore, 1977; Slater, 
Mattock, & Brown, 1990; Walk & Gibson, 1961; for a review, see Kellman 
& Arterberry, 1998). The classic perceptual learning burden of constructing 
meaningful reality from associating sensations is obviated by an improved 
picture of early perception. 

The revised view of perception as sensitive to information about impor- 
tant environmental properties comes with its own mandate for PL, however. 
An observer at any time is surrounded by a wealth of meaningful informa- 
tion about objects, surfaces, and events. There are an unlimited number of 
environmental features and relations that could be important for differ- 
ent tasks. Processes of learning serve to optimize performance of particular 
tasks by discovering which information is relevant to them, refining and 
attuning perceptual mechanisms to selectively extract this information, and 
automating that extraction (E. Gibson, 1969; Kellman & Garrigan, 2009). 
This kind of PL — that makes perceivers better at discovering and extracting 
currently available information — is the prevailing notion of PL in contem- 
porary research. 

Taken together, contemporary views of perception and PL provide clear 
foundations for beginning to understand and explore the role of PL in 
high-level cognitive tasks. The properties of perception that figure promi- 
nently are these: Perceptual representations are amodal, abstract, and selec- 
tive. These are the properties that allow them to be functionally useful in 
thought and action. Extraction of complex relations connects directly to 
high-level thinking and underwrites action. If perceptual representations 
were not in a form that connects to capacities to reason, imagine, and plan, 
it would be hard to see their point. The synergistic relationship between 
extraction of important structure and thinking propels learning and the 
development of expertise.


5. PERCEPTUAL LEARNING AND INSTRUCTION

Mere mention of the word “instruction” evokes an image of a teacher
speaking to a class. Our ordinary intuitions about teaching and learning in 
formal settings, as well as most learning research, appear to be colored by a 
stereotype about what learning is and how it works. Bereiter and Scardama- 
lia (1998) described this stereotype as a “folk psychology” view of learning, 
specifically, what they termed the “container metaphor”:

Knowledge is most readily conceived of as specifiable objects in the mind, such as
discrete facts, beliefs, ideas. . . (Learning) . . . involves retaining and retrieving such
objects. (Bereiter & Scardamalia, 1998, p. 487).

As we have seen, PL encompasses much that falls outside of this view of 
learning. Bereiter and Scardamalia contrasted with the conventional “mind 
as container” view a different idea: “mind as pattern recognizer.” PL is the 
type of learning that leads to mind as pattern recognizer. 

That changes in the way information is extracted are important to 
expertise has been frequently documented. De Groot (1965), himself a 
chess master, studied chess players, with the expectation that master level 
players considered more possible moves and countermoves or in some sense 
thought more deeply about strategy. Instead, he found that their superior- 
ity was shown primarily on the perceptual side. Masters had become able 
to extract meaningful patterns in larger chunks, with greater speed and less 
effort than less skilled players. De Groot (1965) suggested that this profile is 
a hallmark of human expertise in many domains: 

We know that increasing experience and knowledge in a specific field (chess, for
instance) has the effect that things (properties, etc.) which, at earlier stages, had to 
be abstracted, or even inferred are apt to be immediately perceived at later stages. 

To a rather large extent, abstraction is replaced by perception, but we do not know 
much about how this works, nor where the borderline lies. (pp. 33-34) 

Similar differences between experts and novices have since been found in 
research on expertise in a variety of domains, such as science problem solving 
(Chi, Feltovich, & Glaser, 1981; Simon, 2001), radiology (Kundel & Nodine, 
1975; Lesgold, Rubinson, Feltovich, Glaser, & Klopfer, 1988), electronics 
(Egan & Schwartz, 1979), and mathematics (Robinson & Hayes, 1978). An 
influential review of learning and its relation to education (Bransford, Brown, 
& Cocking, 1999) summed it up this way: 

Experts are not simply "general problem solvers" who have learned a set of 
strategies that operate across all domains. The fact that experts are more likely 
than novices to recognize meaningful patterns of information applies in all
domains, whether chess, electronics, mathematics, or classroom teaching. In De
Groot's (1965) words, a "given" problem situation is not really a given. Because of
their ability to see patterns of meaningful information, experts begin problem
solving at "a higher place" (De Groot, 1965). (p. 48)


5.1. Natural Kind Learning 

It is interesting that school learning centers so heavily on explicit verbal
instruction about facts and procedures, given that more implicit PL
processes appear to dominate the prodigious learning accomplishments of
children in the years before they reach school age. Much of early learning 
may be characterized as discovery processes in PL, and these include appre- 
hension of highly abstract relations, even very early on (Marcus, Vijayan, 
Bandi Rao, & Vishton, 1999). Natural kind learning exemplifies some of
the most interesting and powerful characteristics of this kind of learning 
and transfer. Imagine a young child going for a walk with her father. Upon 
seeing a dog, the child points, and her father says “That’s a dog.” Suppose 
this particular dog is a small white poodle. On some other day, the child 
sees another dog — this one a large black Labrador retriever. Again, someone 
says “dog.” And so on. With each instance, something about a particular dog 
(along with the label “dog”) is encoded. As the process continues, and a 
number of instances (probably not a particularly large number) have been 
encountered, the child becomes able to look at a new, never before seen dog 
and say “dog.” This is the magical part, as each new dog will differ in various
ways from any of the examples previously encountered. Moreover, the child 
is concurrently coming to distinguish correctly novel instances of dog, cat, 
squirrel, etc., from each other. A particular cat or squirrel may have proper- 
ties that resemble some known dog; a small black dog and a large black cat 
are more similar in color and size than are a large black and small white 
dog. Despite similarities of instances across different classes and differences 
among instances within classes, the learner comes to extract properties suf- 
ficient to classify novel instances accurately. Much of the relevant PL would 
seem to require discovery of abstract relations, as simple features, such as 
color, are seldom the crucial determinants. Shape variables are often impor- 
tant, such as the differing jaw or body structures of dogs and cats. Shape 
variables are highly relational and abstract, rather than tied to particular 
colors, sizes, and contexts, which is what allows those who have undergone 
this kind of learning to effortlessly recognize a glass tabletop ornament as a 
dog versus a cat. 

The properties underlying a classification can be complex and implicit. 
If a child, or even an adult, is asked to state a set of rules that would allow 
a novice to distinguish dogs, cats, and wolves, they cannot ordinarily do so. 
Even the hypothesis about jaw and body structure of dogs and cats, men- 
tioned in the previous paragraph, is a conjecture the authors have generated 
from poring over examples. For adults, even learning researchers, knowing 
cat versus dog when one sees them is easy, but furnishing an account in 
declarative knowledge or a diagnostic procedure is hard, and it is not a typical
accompaniment of the ability to recognize.

Nor do the toddler’s striking feats of natural learning occur from being
given lectures on the distinguishing features of dogs or cats. Rather, structure 
is extracted from encountering instances and receiving category feedback. 
Such PL processes are crucial not only for developing understanding of the 
objects and events in the world; they also play a pivotal role in language 
acquisition, at multiple levels. Concepts like noun, verb, adverb, and prepo- 
sition are taxing enough when taught explicitly in middle school. How 
is it that these abstract classes are extracted and used in language acquisi- 
tion, allowing grammatical structures to be processed (e.g. Hirsh-Pasek & 
Golinkoff, 1997) and facilitating the learning of new words? At a different
level, learning may be involved in the ability of the young language learner 
to detect invariance in the structure of speech signals across different speak- 
ers. Evidence suggests that the PL processes needed for these achievements, 
including clear cases of abstract PL, are present relatively early in infancy 
(Gomez & Gerken, 1999; Marcus et al., 1999; Saffran, Loman, & Robertson,
2000).

5.2. Relations among Types of Learning: Toward a 
"Fundamental Theorem of Learning" 

When a child starts school or other formal learning, the focus of most 
instructional efforts, as it has been in most research on instruction, is on 
declarative and procedural activities. This emphasis can be seen, in part, 
as fitting with important patterns that scientists have discovered regard- 
ing human cognitive development. Before a certain age, the introduction 
of formal concepts and reasoning is likely to be pointless (NRC, 2001;
Piaget, 1954). 

Conversely, PL is among the types of learning that seem to operate from
the beginning of life, and it plays an important role in natural kind learn- 
ing, language acquisition, and transactions with many kinds of objects and 
events. When a child has reached school age, it might be assumed that with 
those foundations already in place, “higher” cognitive activities — encom- 
passing explicit facts, concepts, procedures, and thinking — take center 
stage. 

We believe it would be a misunderstanding, however, to suppose that
when more explicit aspects of learning are introduced, the PL components 
of learning fade into the background. Although we do not attribute this 
view explicitly to anyone, it may be natural to assume that by school age, 
perceptual transactions with the environment have been largely mastered 
or that they operate in a relatively steady-state fashion. A related point may
be made about the content of thought and learning. Theories of cognitive
development have tended to be saturated with classical views of perception
(Kellman & Arterberry, 1998); thus, in Piaget’s views, and subsequent related
views, the early role of the senses is in associative “sensorimotor” transac- 
tions. Conceptual inputs in various domains must operate to make percep- 
tual data useful for abstract knowledge (e.g. Leslie, 1995; Mandler, 1988, 
1992; Piaget, 1952, 1954). Research that has produced a radically different 
understanding of the perceptual starting points of development changes 
this picture and has profound implications for cognitive development, 
which have been discussed elsewhere (e.g. Jones & Smith, 1993; Kellman &
Arterberry, 1998). In the present context, the crucial consequences of the
contemporary understanding of perception as delivering abstract structural
knowledge are that 1) the perceptual part of learning remains important
in most or all learning domains, and 2) the products of perception are not
static or previously mastered, but are dynamically changing as an important
part of learning in any domain. 

Perhaps most interesting and important, the changes in perceptual 
pickup and the use of declarative and procedural knowledge and reasoning 
should not be considered unrelated aspects of task performance. There is a 
crucial, interactive relationship between these, one that parallels the close 
coupling of perception and action (J. Gibson, 1966, 1979). Although it has 
seldom been emphasized in learning research, PL processes — that attune 
the encoding, classification, discrimination, or recognition of incoming 
information — bear a pervasive relationship to the better-known declara- 
tive and procedural aspects of learning. Only half jokingly, we call this the 
“Fundamental Theorem of Learning.” It states that 

All effective use of declarative and procedural learning presupposes pattern
recognition.

Suppose a learner in some domain has acquired a vast array of facts, con- 
cepts, and procedures. How are these deployed? How do they lead to 
effective problem solving in new situations as they arise? Randomly pro- 
ducing facts and procedures is at best inefficient and at worst pathological. 
Obviously, facts and procedures must be used selectively and appropriately. 
Accomplishing appropriate selection depends on accurate classification of 
problems or situations. When one is confronted with a new problem or 
situation, which facts apply? Which procedures are relevant? Fundamentally, 
these are questions of encoding and classifying the input; they require rec- 
ognizing, amidst irrelevant detail, the structural patterns that matter. They 
are pattern recognition problems.

Becoming able to see what matters in a given situation has long been
regarded as the essence of meaningful learning and creative problem solving
(cf. Duncker, 1945; Wertheimer, 1959). What has often been missing
from the discussions of the role of seeing in problem solving is the learning 
process that allows the learners to become able to recognize, in new situa- 
tions, the meaningful structures that matter and to distinguish the relevant 
from the irrelevant. This is the role of PL, and our statement of this “funda- 
mental theorem” is simply a reminder that even in high-level learning tasks 
and domains, processes that advance encoding, discrimination, classification, 
and structure recognition allow facts, concepts and procedures to be used 
effectively. 

5.3. Perceptual Learning Technology 

Modeling PL is a complicated and unfinished effort (Ahissar & Hochstein, 
2004; Fahle & Poggio, 2002; Kellman & Garrigan, 2009; Petrov et al., 2005).
This is especially true for perceptual classifications that are based on abstract
relations (Kellman, Burke, & Hummel, 1999; for discussion, see Kellman
& Garrigan, 2009). There are currently relatively few models that even
purport to discover abstract relationships that govern a classification, even
in restricted domains. Improving our understanding of such abilities will
be valuable for many scientific and technological reasons. For example, in
computer vision and artificial intelligence, we still lack systems that can
learn to recognize cats in ordinary scenes, much less learn to classify a glass 
table ornament as a cat, and we are far away from being able to extract even 
more abstract regularities, such as when a tone of voice conveys sarcasm. 

Fortunately, the task of understanding the conditions under which PL 
occurs and the variables that affect it is a much more tractable one than 
developing models of high-level PL. Understanding the principles of PL is 
an active area of research (e.g. Ahissar & Hochstein, 2004; Mettler & Kellman, 
2010; Seitz & Watanabe, 2005; Zhang et al., 2010).

Some efforts have focused on complex, real-world tasks, attempting to 
systematically address PL and accelerate the growth of perceptual expertise
in instructional settings. These efforts have already produced remarkably 
successful outcomes in a variety of learning domains. 

Kellman and Kaiser (1994) developed PLMs to address difficult problems
in aviation training. In a Visual Navigation PLM, pilots learned navigational
skills by mapping, on short, speeded trials, videotaped segments of
out-of-the-cockpit views onto locations shown on standard visual navigation
(Visual Flight Rules sectional) charts. Remarkable improvements in
accuracy and speed occurred in less than an hour of training, even among
experienced aviators. In an Instrument Flight PLM, the focus was on flight 
instrument interpretation. On short speeded trials, pilots classified aircraft 
attitude (e.g. climbing, turning) from an array of primary flight displays 
used by pilots to fly in instrument conditions. They found that under an 
hour of training allowed novices to process configurations more quickly 
than and just as accurately as civil aviators who had on average 1000 h of 
flight time (but who had not used the PLM). When experienced pilots
used the PLM, they also showed substantial gains, paring 60% off the time 
needed to interpret instrument configurations. 

PL interventions to address speech and language difficulties have been 
shown to produce benefits (Merzenich et al., 1996; Tallal, Merzenich, Miller,
& Jenkins, 1998). For example, Tallal et al. showed that auditory discrimination
training in language learning using specially enhanced and extended
speech signals improved both auditory discrimination performance and 
speech comprehension in language-impaired children. 

Applications in medical and surgical training illustrate the value of PL in 
addressing dimensions of learning not encompassed by ordinary instruction. 
Guerlain et al. (2004) applied PLM concepts to address issues of anatomic 
recognition in laparoscopic procedures. They found that a computer-based 
PLM approach patterned after the work of Kellman and Kaiser (1994) pro- 
duced better performance than traditional approaches. The training group 
presented with variation in instances selected to encourage learning of 
underlying invariance later showed improvement on perceptual and pro- 
cedural measures, whereas a control group who saw similar displays but 
without the structured PLM did not. Their data implicated PL as the source 
of the improvement, as neither group advanced on strategic or declarative 
knowledge tests. 

More recently, Krasne et al. (submitted) developed and tested two computer-based
perceptual/adaptive learning modules (PALMs) in the preclerkship
curriculum for all first- and second-year medical students at the UCLA
School of Medicine. One module focused on pathologic processes in skin
histology images (Histopathology PALM) and the other on identifying skin-lesion
morphologies (Dermatology PALM). The goal was to assess students’
ability to develop pattern recognition and discrimination skills leading to 
accuracy and fluency in diagnosing new instances of disease-related patterns. 
These were short learning interventions, with objective learning criteria 
typically achieved in 15–35 min. Results indicated strong learning gains in
accurately classifying previously unseen cases, elevating students’ performance
in both the first and second years of medical school well beyond the levels
attained from conventional instruction alone. There were strong gains in both
accuracy and fluency; besides becoming more accurate, learners averaged a
53% reduction in classification time across both years and PALMs. Effect
sizes averaged in the 1.0–1.5 range for both accuracy and fluency. These
results with brief interventions suggest that PL interventions impact aspects
of learning that are not well addressed by conventional instruction. They also
suggest remarkable promise for the use of PL to improve learning in a variety 
of medical and other domains. 

Over the past decade, we have undertaken large-scale, systematic efforts 
to study and apply PL technology in mathematics and science learning 
(Kellman et al., 2009; Massey et al., 2011; Silva & Kellman, 1999; Wise et al.,
2000). Although these subjects involve a variety of cognitive processes, they
rely substantially on pattern recognition and fluent processing of structure,
as well as mapping across transformations (e.g. in algebra) and across multiple
representations (e.g. graphs and equations). Few instructional activities
directly address these aspects of learning, and a variety of indicators suggest
that they may be disproportionately responsible for students’ difficulties
in learning (Kellman et al., 2009). Findings consistently indicate that even
short PLM interventions can accelerate fluent use of structure in contexts
such as the mapping between graphs and equations (Kellman et al., 2008;
Silva & Kellman, 1999), apprehending molecular structure in chemistry
(Russell & Kellman, 1998; Wise et al., 2000), processing algebraic transformations,
and understanding fractions and proportional reasoning (Kellman
et al., 2008, 2009; Massey et al., 2011). Earlier, we presented the example
of an Algebraic Transformations PLM. To convey the scope and approach of
PLMs in mathematics learning, we describe here one other PLM in detail
and summarize some others. These examples will help to illustrate both the
learning effects from PLMs and their distinctive features as learning
interventions. 

An illuminating example is a PLM that we developed to help elemen- 
tary students master linear measurement with rulers of varying scales. When 
one considers a standard ruler, it is a rather remarkable device that organizes 
a numerical symbol system in a spatial layout, essentially the positive side
of a rational number line on a strip of wood or plastic. The continuous 
extent is evenly partitioned into units and marked by numbered hash marks, 
with hash marks of several sizes arranged to indicate different scales layered 
on the same ruler (e.g. half inches, quarter inches, eighth inches). Once one 
has acquired expertise with this instrument, it is a simple matter to “just
see” the structure. The inches or quarter inches or centimeters or meters
are readily perceived as objects that can be manipulated and enumerated in 
various ways to measure linear extents. 

As countless teachers can testify, however, acquiring such insight is 
not a simple or reliable achievement for many elementary or even mid- 
dle school students, despite conscientious instruction. An indication of 
the learning difficulty comes from results on the National Assessment of
Educational Progress (http://nces.ed.gov/nationsreportcard/itmrlsx/search.aspx?subject=mathematics),
on a released item known as the broken ruler
problem. A version of the problem is illustrated in Figure 4.3. A toothpick 
is pictured above a standard 12-inch ruler that has been broken so that the 
left-hand edge starts at 7 inches. The toothpick is positioned so that it starts 
at 8 and ends at 10 14, and students are asked to enter the length of the ruler. 
Alarmingly, only 20% of 4th graders and 58% of 8th graders give the correct 
answer. Of particular interest are the two most common incorrect answers: 
10 14 and 3 14. Students who give the former answer are most likely fol- 
lowing a poorly understood, inflexible procedure that involves reading the 
rightmost endpoint as the length — simply ignoring that the ruler is broken. 
Students who say that the toothpick is 3 14 inches long are probably relying 
on a counting routine and counting the hash marks starting with the left- 
most hash mark as “1.” (It is a common classroom observation that students 
are extremely puzzled as to why the left-most edge of a ruler is 0 rather 
than 1, and why they are told to line things up starting at 0. After aU, when 
counting discrete items, one always starts with one, not zero.) 
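
The arithmetic behind the correct answer and the two common errors can be sketched in a few lines. This is only an illustrative model: the function names are hypothetical, the toothpick is taken to run from the 8-inch mark to the 10½-inch mark, and the two faulty strategies are simplified renderings of the student behavior described above, not procedures documented by the assessment.

```python
from fractions import Fraction


def true_length(start, end):
    """Length as the extent between the two endpoints."""
    return end - start


def read_endpoint(start, end):
    """Faulty strategy (assumed model): report the right-hand reading,
    ignoring where the object starts."""
    return end


def count_marks_from_one(start, end):
    """Faulty strategy (assumed model): treat the hash mark at the start
    as "1" and keep counting, which overshoots by one unit."""
    return end - start + 1


# Endpoints assumed from the description of the item: 8 and 10 1/2 inches.
start, end = Fraction(8), Fraction(21, 2)

for strategy in (true_length, read_endpoint, count_marks_from_one):
    print(f"{strategy.__name__}: {strategy(start, end)}")
# true_length: 5/2
# read_endpoint: 21/2
# count_marks_from_one: 7/2
```

Only the first function treats a unit as an extent between marks; the other two operate on the numerals printed on the ruler, which is why they happen to succeed when the object starts at zero (endpoint reading) or fail consistently by one unit (counting from one).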

Both of these incorrect answers indicate that the students are not per- 
ceiving units on the ruler that have extent. From conventional instruction, 
they have picked up some aspects of measurement facts and procedures, 
but the mapping of what they have learned onto structure in the problem 
is faulty. They do not recognize that an inch (or centimeter, etc.) on a ruler 
is the extent between the hash marks that demarcate the unit, not just the 
point where the numbered hash mark is located. The beauty of the broken 
ruler problem is that it reveals this; students succeed with much higher 
accuracy if they are given an ordinary ruler problem, in which the zero 
point lines up with the left edge of the toothpick. A related and persistent 
difficulty is that students struggle to map fractions to rulers. Difficulties with 
fraction notation aside, if a student does not see an extended unit to begin 
with, he or she will have difficulty identifying the subpartitions of units that 
map to fractional quantities. 





What is the length of the toothpick in the figure above? 


Figure 4.3 The “broken ruler” problem. Released measurement item from the 2003 
National Assessment of Educational Progress (US Department of Education, IES, 
National Center for Education Statistics, http://nces.ed.gov/nationsreportcard/itmrlsx). 


To address this problem of seeing the relevant structure, we devel- 
oped learning software that presents students with many short, interac- 
tive, animated learning trials in which students interact with the key 
structures and relationships underlying linear measurement. A typical 
trial presents the student with a graphic display showing a ball on top of 
a ruler and a billiard cue poised to strike it. The student is given either a 
starting point and an ending point and asked to say the distance traveled, 
or they are given a starting point and a traveling distance and are asked 
to say what the endpoint will be. The learning items in the database vary 
with the types of values involved, whether the rulers are fully versus 
partially labeled, and whether they are partitioned in the most economi- 
cal way to solve the problem or are overpartitioned (e.g. a ruler marked 
in units of sixteenths for a problem involving eighths). Movement on 
the ruler can be either rightward or leftward. The quantities involved 
vary from single digits into the hundreds and include both fractions 
and integers. Learners receive immediate animated feedback on each 
trial and are also given periodic feedback on their progress through the 
module. 
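
To make the dimensions of variation concrete, the following is a minimal sketch of how such trials might be generated. It is not the authors' software: the names `Trial` and `make_trial`, the default of eighths on a 12-unit ruler, and the choice to model overpartitioning by doubling the tick density are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Trial:
    kind: str               # "distance": ask for distance; "endpoint": ask for endpoint
    start: Fraction         # where the ball starts on the ruler
    distance: Fraction      # how far it travels (always positive)
    direction: int          # +1 for rightward, -1 for leftward
    ruler_denominator: int  # tick marks per unit on the displayed ruler

    @property
    def end(self):
        return self.start + self.direction * self.distance

    @property
    def answer(self):
        return self.distance if self.kind == "distance" else self.end


def make_trial(rng, denominator=8, max_units=12):
    """Draw one trial with randomly varied task type, direction, and values."""
    kind = rng.choice(["distance", "endpoint"])
    direction = rng.choice([1, -1])
    # Pick two distinct tick positions so the travel distance is never zero.
    low, high = sorted(rng.sample(range(max_units * denominator + 1), 2))
    start_tick = low if direction == 1 else high
    # Overpartitioned rulers show finer ticks than the problem requires.
    overpartitioned = rng.choice([False, True])
    ruler_denominator = denominator * 2 if overpartitioned else denominator
    return Trial(
        kind=kind,
        start=Fraction(start_tick, denominator),
        distance=Fraction(high - low, denominator),
        direction=direction,
        ruler_denominator=ruler_denominator,
    )


rng = random.Random(0)
for _ in range(3):
    trial = make_trial(rng)
    print(trial.kind, trial.start, trial.direction, trial.distance, trial.answer)
```

Drawing each property independently on every trial is one simple way to ensure that no incidental feature (direction, tick density, task type) reliably predicts the answer, so only the unit structure itself remains informative.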

Instead of emphasizing verbal explanations or procedural calculations, 
this Linear Measurement PLM concentrates the students’ attention and effort 
on learning to pick up relevant structures and relationships. The items in the 
learning set are designed so that each student sees many varied examples; 
these are conditions in which PL processes come to discover and fluently 
extract important structures in different contexts. In this way, PLMs acceler- 
ate students’ expertise until they are able to “just see” what is important and 
relevant in each problem. 

In a formal study, 63 sixth-grade students in a low-performing urban 
middle school completed a pretest and used the Linear Measurement PLM, 
then took an immediate post-test as well as a delayed post-test a full 4 
months later. The 6th graders were compared with a group of 49 
7th and 8th graders in the same school who took the assessment without 
using the module. The assessment, which included many transfer items that 
did not resemble the learning trials, tested children’s ability to use a par- 
titioned number line to express the length of a line segment in generic 
units; to use both conventional and broken rulers to measure lengths in 
inches and centimeters; to use conventional and broken rulers to construct 
extents with varying lengths; to solve addition and subtraction problems 
with fractions; and to solve open-ended word problems involving linear 
measurements. Both the 6th grade intervention students at pretest and the 
older control students scored <50% on the assessment. After completing the 
module, the 6th graders’ scores improved dramatically (Figure 4.4), with 
effect sizes (Cohen’s d) comparing pretest scores versus post-test scores and 
intervention versus control groups ranging from 0.86 to 1.06 (Kellman, 
Massey & Son, 2009; Massey, Kellman, Roth, & Burke, 2011). The studies 
also demonstrated remarkable durability of learning: Scores on delayed 
post-tests conducted 4 months later, with no intervening study activities, 
indicated that the learning gains for the intervention groups were fully 
maintained. 
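
For readers unfamiliar with the effect-size measure cited above, Cohen's d expresses a difference between two means in units of standard deviation. The sketch below uses the common pooled-standard-deviation form for two independent groups; the chapter does not state which variant was computed in the study, and the scores in the example are invented purely to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented scores, NOT data from the study: means 4 and 3, pooled SD 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

On this scale, values near 0.8 are conventionally described as large, which gives a sense of the size of the effects reported above.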

Other PLM interventions in mathematics learning have produced com- 
parable results. In the area of fractions and measurement, PLMs focusing on 
partitioning and iterating units and mapping equivalent quantities across 
different units not only produced effect sizes in the range of 1.0 to 2.8 but 
led to remote transfer of learning to multiplying and dividing fractions and 
mixed numbers (tasks that were not part of the PLM). Moreover, as in the 
case of the Linear Measurement PLM described above, the learning gains 
showed no decrements in delayed post-tests administered 4-5 months later. 
Both the remote transfer and durability of the learning highlight important 
characteristics of PL: Becoming able to see relevant structure in a domain 
allows that structure to be used in varied tasks and constitutes an enduring 
kind of learning. 

5.4. Elements of Perceptual Learning in Instruction 

These examples of PLMs in real-world learning contexts illustrate some of 
the conditions that produce PL effects and some of the characteristics of 
learning attainments from these interventions. More generally, what are the 
elements of PL interventions? 

[Figure 4.4: bar chart. Bars, left to right: 6th Grade PLM Pretest; 6th Grade 
PLM Immediate Post-test; 6th Grade PLM 4-Mo. Delayed Post-test; 7th & 8th 
Grade Control.] 

Figure 4.4 Results from a study of the Linear Measurement PLM. Pretest, immediate 
post-test, and delayed post-test scores are shown for the 6th grade intervention group 
compared to 7th and 8th grade students in the same school. The delayed post-test was 
administered after a delay of 4 months, with no intervening study activities. Error bars 
indicate ±1 standard error of the mean. (Adapted from Kellman, Massey & Son, TopiCS in 
Cognitive Science, 2009; Cognitive Science Society, Inc., p. 16.) 


Kellman et al. (2009) argued that at least three general properties are 
crucial.^ The most basic requirement is that PL tasks focus on the extraction 
of structure. PLMs involve encoding, discrimination, comparison, and/or 
classification. A contrast in mathematics learning is that PL interventions 
need not involve computation of numerical answers. In PL tasks, the learner 
engages in practice with displays or representations in which success depends 
on the learner coming to attend to, discriminate, classify, or map structure. 
Utilizing structure is of course involved in other types of instruction, but 

^This section focuses on characteristics that define PL interventions as a distinctive type of learning 
activity. Many more specific features of PLMs, not discussed in detail here, serve to optimize learn- 
ing in this general format and to configure them for particular learning challenges and goals. These 
include issues of sequencing, feedback, variation of positive and negative instances of categories, mix- 
ing of learning tasks, integration of PL activities with conventional instruction, and so on. A number 
of features of PL technology and related adaptive learning technology are covered by US patent 
#7052277 and patents pending, assigned to Insight Learning Technology, Inc. For information, please 
contact the authors or Info@insightlearningtech.com. 




a PL task focuses on commonalities and variations in structure as its pri- 
mary learning content. A second characteristic is that PLMs tend to involve 
numerous short classification trials with varied instances. The learner makes 
classifications on these trials and (in most cases) receives feedback. System- 
atic variation across learning instances is crucial, because in most real-world 
tasks, PL involves the discovery of invariance amidst variation (Gibson, 
1969). Discovery processes require sufficient variation for relevant proper- 
ties to be disentangled from irrelevant ones. This aspect of PL interventions 
is most powerful in producing transfer of learning to new situations that 
involve common or related structures. Emphasis on the discovery of invari- 
ance amidst robust variation is crucial in realistic learning tasks, but it differs 
from most contemporary laboratory studies of PL, which typically target 
simple sensory discriminations and involve large numbers of trials with a 
small set of fixed stimuli (e.g. Fahle & Poggio, 2002; for a discussion, see 
Garrigan & Kellman, 2008). Finally, PL interventions tend to have minimal 
emphasis on explicit instruction. The learning comes from transactions with 
the input, not verbal exchanges. The primary task in a PL intervention does 
not involve verbal or written explanations of facts, concepts, or procedures. 
This is a major difference from conventional instruction, which is domi- 
nated by explicit description (which also addresses important aspects of 
learning). PL interventions may incorporate explicit introductions or brief 
discussions, but these do not comprise the central learning tasks nor are they 
capable of producing the results obtained with PLMs. 

Another important question is, “How do we know that PL effects are 
occurring from a learning intervention?” In complex tasks and realistic 
learning settings, we have less control over materials and activities than in 
most laboratory situations. Moreover, we would expect, as we have argued 
in this chapter, that PL works synergistically with other processes of learning 
and thinking in domains such as mathematics. Given these background 
conditions, it is unlikely that any intervention in a complex learning domain 
targets PL uniquely, and it is difficult to claim that any learning gains are 
solely the result of PL. Likewise, although the issue has not received much 
attention, it would be hard to claim that effects produced by other instruc- 
tional interventions do not involve a PL component. PLMs attempt to con- 
dense or accelerate PL, but PL no doubt goes on less systematically in other 
learning situations. 

Synergies aside, there appear to be some characteristic signatures of 
PL effects. Kellman et al. (2009) suggested four of these, summarized 
in Table 4.2: 1. Generativity in structure use. PLMs in complex learning 
domains are designed to improve pickup and processing of structural 
invariants across variable contexts. A hallmark of successful PL is 
evidence of accurate and/or fluent classification of novel cases. Moreover, 
PLMs often facilitate remote transfer to different-looking problem 
types that involve the same underlying structure. Such transfer is 
a notorious problem in settings using conventional declarative instructional 
approaches. Evidence of accurate and fluent classification of novel 
instances, and transfer to contexts involving different procedural requirements 
but common structures, provides evidence of PL. 2. Fluency effects. 
PL effects typically include improved fluency of information extraction 
(indicated in measures of speed, parallel processing, or reduced effort 
or cognitive load). Acquisition data within PLMs suggest that fluency 
in information extraction increases gradually across interactive trials. 
Gradual improvement is not unique to PL but does contrast with 
some effects of declarative instruction, in which a learner may either 
know or not know a certain concept. A particularly clear case of PL 
effects on fluency can be made when relevant declarative knowledge 
is already present prior to an intervention, and a PL intervention produces 
improved fluency, as in the Algebraic Transformations PLM described 


Table 4.2 Some Possible Signature Effects of Perceptual Learning Interventions. The 
effects shown are common outcomes of PL interventions that tend to distinguish them 
from outcomes of instruction focused on declarative or procedural knowledge (see text). 

Generativity in use of structure 

• Accurate and/or fluent processing of novel cases 

• Improvement on unpracticed tasks that involve learned structures 


Improvements in fluency 

• Faster processing 

• Greater parallel processing 

• Reduced cognitive load or effort 


Implicit pattern recognition versus explicit knowledge 

• Improved performance without new explicit declarative or procedural 
knowledge 


Durability of learning 

• Improved information extraction and structural intuition that persist over 
long delays and are highly resistant to forgetting 




earlier. 3. Implicit pattern recognition versus explicit knowledge. Although PL 
may provide important scaffolding for explicit, verbalizable knowledge, 
PL itself need not involve changes in explicit knowledge. PL changes 
the way a learner views a problem or representation, and these changes 
need not be accompanied by new explicit facts, concepts, or procedures. 
Transfer tests routinely indicate this dissociation from PL interventions 
(Guerlain et al., 2004; Kellman et al., 2009). 4. Delayed testing effects. Common 
wisdom has it that one never forgets how to ride a bicycle. If true, 
riding a bicycle, a task that clearly involves considerable PL, differs from 
most declarative and procedural learning. It is not by accident that math 
teachers spend the first month of a new school year reviewing con- 
tent from the prior year. Facts and procedures are subject to substantial 
forgetting over a period as long as a summer vacation from school. 
Although more research is needed, there are indications that the improved 
facility in picking up patterns and structure from PL, like riding a bicycle, 
may be less subject to decay over time. In the measurement and fraction 
PLMs described above, we have consistently observed no decrements 
in learning gains when students were tested after 4- to 5-month delays 
(Kellman et al., 2009; Massey et al., 2011). 

It is also possible to test directly for PL effects. In domains where the 
central task is clearly focused on classification, such as the Dermatology and 
Histopathology PALMs described above, rapid and accurate classification of 
new instances illustrates straightforwardly that learners have improved in 
the pickup of information. For PL interventions in cognitive domains that 
also involve symbolic material and substantial reasoning components, the 
situation is more complicated in attributing learning gains to specifically 
PL effects. In applying PL technology to such domains, investigators have 
usually had as a first priority showing that PL leads to meaningful learn- 
ing gains, beyond those of conventional instruction. Thus, tests of learning 
and transfer have typically assessed learning on important domain-rele- 
vant tasks; in mathematics PLMs, these have involved tasks such as solving 
algebra problems, performing operations with fractions or measurement, 
or generating a correct graph from an equation or an equation from a 
word problem (Kellman et al., 2008). However, more basic psychophysical 
tests in complex PL domains are also possible. Thai, Mettler, and Kellman 
(2011) showed that PL interventions, like those we have used in com- 
plex, symbolic domains, produce basic changes in information extrac- 
tion. Participants were trained to classify Chinese characters, based on 
either overall configurations (structures), featural relations (components), 
or nonrelational information (stroke count), used as a control. PLM participants 
showed strong domain-relevant learning gains in discriminating 
and classifying Chinese characters. Before and after training, however, they 
were also tested for basic changes in information extraction using a visual 
search task, which had not been part of training. Search displays contained 
all novel exemplars, involved manipulations of target-distractor similarity 
using structures and components, and included heterogeneous and homo- 
geneous distractors. Robust improvements in visual search for structure 
and component PL training were found relative to a control group that 
did not undergo PLM training. These results provide direct evidence that 
high-level PL interventions improve learning by altering extraction of 
information, including changing perceptual sensitivity to important rela- 
tional structures. This study is interesting in connecting a high-level cog- 
nitive task to changes in information pickup detectable by more basic 
psychophysical methods. Further research of this sort may prove useful, 
both in understanding the synergies of various cognitive abilities and in 
optimizing PL interventions. 

Another significant issue for further research is how PL interventions 
might best be combined with other modes of instruction. Acquiring declar- 
ative and procedural knowledge, improving critical thinking, and other 
aspects of learning do not become less important because we are coming 
to understand that neglected PL components are crucial in many learning 
domains. In fact, it seems likely that instructional methods of all types 
will benefit from understanding more clearly these different components of 
learning and their interactions. A discussion of explicit concepts, or a proof, 
may be easier when the teacher knows that the student is correctly mapping 
the words onto problem structure, and a procedure may be better under- 
stood, better remembered, and certainly better applied, when the student 
can see where and why it applies. 


Research in PL offers previously unsuspected synergies with high- 
level cognitive tasks and processes. Through an emerging technology of 
PL, it also promises remarkable potential to improve learning in almost any 
domain, including complex, symbolic ones. To understand and utilize these 
possibilities fully requires making clear basic connections between perception, 
cognition, and learning, especially the implications of contemporary 
views of perception as abstract, amodal, and selective. In this chapter, we 
have tried to describe these connections in ways that allow us to integrate 
and illuminate recent research and applications of PL. We hope these efforts 
contribute to future progress in understanding cognition, perception, and 
learning. 